SQL-BASED ANALYTIC ALGORITHMS

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. Section 119(e) of the co- 
pending and commonly-assigned U.S. provisional patent application Serial No. 
60/102,831, filed October 2, 1998, by Timothy E. Miller, Brian D. Tate, James D. 
Hildreth, Miriam H. Herman, Todd M. Brye, and James E. Pricer, entitled
Teradata Scalable Discovery, which application is incorporated by reference herein. 

This application is also related to the following co-pending and commonly- 
assigned utility patent applications: 

Application Serial No. --/---,---, filed on same date herewith, by 
Brian D. Tate, James E. Pricer, Tej Anand, and Randy G. Kerber, entitled 
SQL-Based Analytic Algorithm for Association, attorney's docket number 
8219, 

Application Serial No. --/---,---, filed on same date herewith, by 
James D. Hildreth, entitled SQL-Based Analytic Algorithm for Clustering, 
attorney's docket number 8220, 

Application Serial No. --/---,---, filed on same date herewith, by 
Todd M. Brye, entitled SQL-Based Analytic Algorithm for Rule Induction, 
attorney's docket number 8221, 

Application Serial No. --/---,---, filed on same date herewith, by 
Brian D. Tate, entitled SQL-Based Automated Histogram Bin Data 
Derivation Assist, attorney's docket number 8222,

Application Serial No. --/---,---, filed on same date herewith, by
Brian D. Tate, entitled SQL-Based Automated, Adaptive, Histogram Bin
Data Description Assist, attorney's docket number 8223,

Application Serial No. PCT/US99/--,---, filed on same date
herewith, by Timothy E. Miller, Brian D. Tate, Miriam H. Herman, Todd
M. Brye, and Anthony L. Rollins, entitled Data Mining Assists in a
Relational Database Management System, attorney's docket number 8224,

Application Serial No. --/---,---, filed on same date herewith, by
Todd M. Brye, Brian D. Tate, and Anthony L. Rollins, entitled SQL-Based
Data Reduction Techniques for Delivering Data to Analytic Tools,
attorney's docket number 8225,

Application Serial No. PCT/US99/--,---, filed on same date
herewith, by Timothy E. Miller, Miriam H. Herman, and Anthony L.
Rollins, entitled Techniques for Deploying Analytic Models in Parallel,
attorney's docket number 8226, and

Application Serial No. PCT/US99/--,---, filed on same date
herewith, by Timothy E. Miller, Brian D. Tate, and Anthony L. Rollins,
entitled Analytic Logical Data Model, attorney's docket number [illegible],

all of which are incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates in general to a relational database management
system, and in particular, to SQL-based analytic algorithms that provide statistical
and machine learning methods to create analytic models from the data residing in a
relational database.

2. Description of Related Art

Relational databases are the predominate form of database management
systems used in computer systems. Relational database management systems are
often used in so-called "data warehouse" applications where enormous amounts of
data are stored and processed. In recent years, several trends have converged to
create a new class of data warehousing applications known as data mining
applications. Data mining is the process of identifying and interpreting patterns in
databases, and can be generalized into three stages.

Stage one is the reporting stage, which analyzes the data to determine what
happened. Generally, most data warehouse implementations start with a focused
application in a specific functional area of the business. These applications usually
focus on reporting historical snapshots of business information that was previously
difficult or impossible to access. Examples include Sales Revenue Reporting,
Production Reporting and Inventory Reporting, to name a few.

Stage two is the analyzing stage, which analyzes the data to determine why
it happened. As stage one end-users gain previously unseen views of their business,
they quickly seek to understand why certain events occurred; for example, a decline
in sales revenue. After discovering a reported decline in sales, data warehouse users
will then obviously ask, "Why did sales go down?" Learning the answer to this
question typically involves probing the database through an iterative series of ad
hoc or multidimensional queries until the root cause of the condition is discovered.
Examples include Sales Analysis, Inventory Analysis or Production Analysis.

Stage three is the predicting stage, which tries to determine what will
happen. As stage two users become more sophisticated, they begin to extend their
analysis to include prediction of unknown events. For example, "Which end-users
are likely to buy a particular product?", or "Who is at risk of leaving for the
competition?" It is difficult for humans to see or interpret subtle relationships in
data, hence as data warehouse users evolve to sophisticated predictive analysis they
soon reach the limits of traditional query and reporting tools. Data mining helps
end-users break through these limitations by leveraging intelligent software tools to
shift some of the analysis burden from the human to the machine, enabling the
discovery of relationships that were previously unknown.

Many data mining technologies are available, from single algorithm
solutions to complete tool suites. Most of these technologies, however, are used in
a desktop environment where little data is captured and maintained. Therefore,
most data mining tools are used to analyze small data samples, which were gathered
from various sources into proprietary data structures or flat files. On the other
hand, organizations are beginning to amass very large databases and end-users are
asking more complex questions requiring access to these large databases.

Unfortunately, most data mining technologies cannot be used with large
volumes of data. Further, most analytical techniques used in data mining are
algorithmic-based rather than data-driven, and as such, there is currently little
synergy between data mining and data warehouses. Moreover, from a usability
perspective, traditional data mining techniques are too complex for use by database
administrators and application programmers, and are too difficult to change for a
different industry or a different customer.

Thus, there is a need in the art for data mining applications that directly
operate against data warehouses, and that allow non-statisticians to benefit from
advanced mathematical techniques available in a relational environment.

SUMMARY OF THE INVENTION

To overcome the limitations in the prior art described above, and to
overcome other limitations that will become apparent upon reading and
understanding the present specification, the present invention discloses a method,
apparatus, and article of manufacture for performing data mining applications in a
relational database management system. At least one analytic algorithm is
performed by a computer directly against a relational database, wherein the analytic
algorithm includes SQL statements performed by the relational database
management system and optional programmatic iteration, and the analytic
algorithm creates at least one analytic model within an analytic logical data model
from data residing in the relational database.

An object of the present invention is to provide more efficient usage of
parallel processor computer systems. An object of the present invention is to
provide a foundation for data mining tool sets in relational database management
systems. Further, an object of the present invention is to allow data mining of large
databases.



BRIEF DESCRIPTION OF THE DRAWINGS 
Referring now to the drawings in which like reference numbers represent
corresponding parts throughout:

FIG. 1 is a block diagram that illustrates an exemplary computer hardware
environment that could be used with the preferred embodiment of the present
invention;

FIG. 2 is a block diagram that illustrates an exemplary logical architecture
that could be used with the preferred embodiment of the present invention; and

FIGS. 3, 4, and 5 are flowcharts that illustrate exemplary logic performed
according to the preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In the following description of the preferred embodiment, reference is made
to the accompanying drawings, which form a part hereof, and in which is shown
by way of illustration a specific embodiment in which the invention may be
practiced. It is to be understood that other embodiments may be utilized and
structural changes may be made without departing from the scope of the present
invention.

OVERVIEW

The present invention provides a relational database management system
(RDBMS) that supports data mining operations of relational databases. In essence,
advanced analytic processing capabilities for data mining applications are placed
where they belong, i.e., close to the data. Moreover, the results of these analytic
processing capabilities can be made to persist within the database or can be exported
from the database. These analytic processing capabilities and their results are
exposed externally to the RDBMS by an application programmable interface (API).

According to the preferred embodiment, the data mining process is an
iterative approach referred to as a "Knowledge Discovery Analytic Process"
(KDAP). There are six major tasks within the KDAP:

1. Understanding the business objective.

2. Understanding the source data available.

3. Selecting the data set and "pre-processing" the data.

4. Designing the analytic model.

5. Creating and testing the models.

6. Deploying the analytic models.

The present invention provides various components for addressing these tasks:

• An RDBMS that executes Structured Query Language (SQL)
statements against a relational database.

• An analytic Application Programming Interface (API) that creates
scalable data mining functions comprised of complex SQL
statements.

• Application programs that instantiate and parameterize the analytic
API.

• Analytic algorithms utilizing:
    • Extended ANSI SQL statements,
    • a Call Level Interface (CLI) comprised of SQL statements
      and programmatic iteration, and
    • a Data Reduction Utility Program comprised of SQL
      statements and programmatic iteration.

• An analytical logical data model (LDM) that stores results from and
information about the advanced analytic processing in the RDBMS.

• A parallel deployer that controls parallel execution of the results of
the analytic algorithms that are stored in the analytic logical data
model.

The benefits of the present invention include:

• Data mining of very large databases directly within a relational
database.

• Management of analytic results within a relational database.

• A comprehensive set of analytic operations that operate within a
relational database management system.

• Application integration through an object-oriented API.

These components and benefits are described in more detail below.

HARDWARE ENVIRONMENT

FIG. 1 is a block diagram that illustrates an exemplary computer hardware
environment that could be used with the preferred embodiment of the present
invention. In the exemplary computer hardware environment, a massively parallel
processing (MPP) computer system 100 is comprised of one or more processors or
nodes 102 interconnected by a network 104. Each of the nodes 102 is comprised of
one or more processors, random access memory (RAM), read-only memory
(ROM), and other components. It is envisioned that attached to the nodes 102 may
be one or more fixed and/or removable data storage units (DSUs) 106 and one or
more data communications units (DCUs) 108, as is well known in the art.

Each of the nodes 102 executes one or more computer programs, such as a
Data Mining Application (APPL) 110 performing data mining operations,
Advanced Analytic Processing Components (AAPC) 112 for providing advanced
analytic processing capabilities for the data mining operations, and/or a Relational
Database Management System (RDBMS) 114 for managing a relational database 116
stored on one or more of the DSUs 106 for use in the data mining applications,
wherein various operations are performed in the APPL 110, AAPC 112, and/or
RDBMS 114 in response to commands from one or more Clients 118. In
alternative embodiments, the APPL 110 may be executed in one or more of the
Clients 118, or on an application server on a different platform attached to the
network 104.

Generally, the computer programs are tangibly embodied in and/or
retrieved from RAM, ROM, one or more of the DSUs 106, and/or a remote device
coupled to the computer system 100 via one or more of the DCUs 108. The
computer programs comprise instructions which, when read and executed by a
node 102, cause the node 102 to perform the steps necessary to execute the steps or
elements of the present invention.

Those skilled in the art will recognize that the exemplary environment
illustrated in FIG. 1 is not intended to limit the present invention. Indeed, those
skilled in the art will recognize that other alternative hardware environments may
be used without departing from the scope of the present invention. In addition, it
should be understood that the present invention may also apply to other computer
programs than those disclosed herein.

LOGICAL ARCHITECTURE

FIG. 2 is a block diagram that illustrates an exemplary logical architecture of
the AAPC 112, and its interaction with the APPL 110, RDBMS 114, relational
database 116, and Client 118, according to the preferred embodiment of the present
invention. In the preferred embodiment, the AAPC 112 includes the following
components:

• An Analytic Logical Data Model (LDM) 200 that stores results from
the advanced analytic processing in the RDBMS 114,

• One or more Scalable Data Mining Functions 202 that
comprise complex, optimized SQL statements that perform
advanced analytic processing in the RDBMS 114,

• An Analytic Application Programming Interface (API) 204 that
provides a mechanism for an APPL 110 or other component to
invoke the Scalable Data Mining Functions 202,

• One or more Analytic Algorithms 206 that can operate as standalone
applications or can be invoked by another component, wherein the
Analytic Algorithms 206 comprise:
    • Extended ANSI SQL 208 that can be used to implement a
      certain class of Analytic Algorithms 206,
    • A Call Level Interface (CLI) 210 that can be used when a
      combination of SQL and programmatic iteration is required
      to implement a certain class of Analytic Algorithms 206, and
    • A Data Reduction Utility Program 212 that can be used to
      implement a certain class of Analytic Algorithms 206 where
      data is first reduced using SQL followed by programmatic
      iteration.

• An Analytic Algorithm Application Programming Interface (API)
214 that provides a mechanism for an APPL 110 or other
components to invoke the Analytic Algorithms 206,

• A Parallel Deployer 216 that controls parallel executions of the
results of an Analytic Algorithm 206 (sometimes referred to as an
analytic model) that are stored in the Analytic LDM 200, wherein
the results of executing the Parallel Deployer 216 are stored in the
RDBMS 114.

Note that the use of these various components is optional, and thus only
some of the components may be used in any particular configuration.

The preferred embodiment is oriented towards a multi-tier logical
architecture, in which a Client 118 interacts with the various components described
above, which, in turn, interface to the RDBMS 114 to utilize a large central
repository of enterprise data stored in the relational database 116 for analytic
processing.

In one example, a Client 118 interacts with an APPL 110, which interfaces
to the Analytic API 204 to invoke one or more of the Scalable Data Mining
Functions 202, which are executed by the RDBMS 114. The results from the
execution of the Scalable Data Mining Functions 202 would be stored as an analytic
model within an Analytic LDM 200 in the RDBMS 114.

In another example, a Client 118 interacts with one or more Analytic
Algorithms 206 either directly or via the Analytic Algorithm API 214. The
Analytic Algorithms 206 comprise SQL statements that may or may not include
programmatic iteration, and the SQL statements are executed by the RDBMS 114.
In addition, the Analytic Algorithms 206 may or may not interface to the Analytic
API 204 to invoke one or more of the Scalable Data Mining Functions 202, which
are executed by the RDBMS 114. Regardless, the results from the execution of the
Analytic Algorithms 206 would be stored as an analytic model within an Analytic
LDM 200 in the RDBMS 114.

In yet another example, a Client 118 interacts with the Parallel Deployer
216, which invokes parallel instances of the results of the Analytic Algorithms 206,
sometimes referred to as an Analytic Model. The Analytic Model is stored in the
Analytic LDM 200 as a result of executing an instance of the Analytic Algorithms
206. The results of executing the Parallel Deployer 216 are stored in the RDBMS
114.

In still another example, a Client 118 interacts with the APPL 110, which
invokes one or more Analytic Algorithms 206 either directly or via the Analytic
Algorithm API 214. The results would be stored as an analytic model within an
Analytic LDM 200 in the RDBMS 114.

The overall goal is to significantly improve the performance, efficiency, and
scalability of data mining operations by performing compute and/or I/O intensive
operations in the various components. The preferred embodiment achieves this not
only through the parallelism provided by the MPP computer system 100, but also
from reducing the amount of data that flows between the APPL 110, AAPC 112,
RDBMS 114, Client 118, and other components.

Those skilled in the art will recognize that the exemplary configurations
illustrated and discussed in conjunction with FIG. 2 are not intended to limit the
present invention. Indeed, those skilled in the art will recognize that other
alternative configurations may be used without departing from the scope of the
present invention. In addition, it should be understood that the present invention
may also apply to other components than those disclosed herein.

Scalable Data Mining Functions

The Scalable Data Mining Functions 202 comprise complex,
optimized SQL statements that are created, in the preferred embodiment, by
parameterizing and instantiating the corresponding Analytic APIs 204. The
Scalable Data Mining Functions 202 perform much of the advanced analytic
processing for data mining applications, when performed by the RDBMS
114, without having to move data from the relational database 116.

The Scalable Data Mining Functions 202 can be categorized by the
following functions:

• Data Description: The ability to understand and describe the
available data using statistical techniques. For example, the
generation of descriptive statistics, frequencies and/or histogram
bins.

• Data Derivation: The ability to generate new variables
(transformations) based upon existing detailed data when designing
an analytic model. For example, the generation of predictive
variables such as bitmaps, ranges, codes and mathematical functions.

• Data Reduction: The ability to reduce the number of variables
(columns) or observations (rows) used when designing an analytic
model. For example, creating Covariance, Correlation, or Sum of
Squares and Cross-Products (SSCP) Matrices.

• Data Reorganization: The ability to join or denormalize
pre-processed results into a wide analytic data set.

• Data Sampling/Partitioning: The ability to intelligently request
different data samples or data partitions. For example, hash data
partitioning or data sampling.

The principal theme of the Scalable Data Mining Functions 202 is to
facilitate analytic operations within the RDBMS 114, which process data collections
stored in the database 116 and produce results that also are stored in the database
116. Since data mining operations tend to be iterative and exploratory, the database
116 in the preferred embodiment comprises a combined storage and work space
environment. As such, a sequence of data mining operations is viewed as a set of
steps that start with some collection of tables in the database 116, generate a series
of intermediate work tables, and finally produce a result table or view.
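
By way of illustration only, the sequence of steps described above (source table, intermediate work table, result table) may be sketched as follows for the Data Description category. The sketch uses Python with an in-memory SQLite database as a stand-in for the RDBMS 114. The table and column names (sales, amount, work_bins, result_histogram) and the fixed bin width are assumptions made for this example and do not appear in the preferred embodiment.

```python
import sqlite3

# Hypothetical source table standing in for detailed data in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 5.0), (2, 12.0), (3, 18.0), (4, 25.0), (5, 31.0), (6, 40.0)],
)

# Step 1: descriptive statistics are computed inside the database,
# so only one summary row leaves it.
stats = conn.execute(
    "SELECT COUNT(*), MIN(amount), MAX(amount), AVG(amount) FROM sales"
).fetchone()

# Step 2: an intermediate work table assigns each row to a fixed-width bin.
BIN_WIDTH = 10.0  # assumed bin width, chosen for the example
conn.execute(
    "CREATE TEMP TABLE work_bins AS "
    f"SELECT CAST(amount / {BIN_WIDTH} AS INTEGER) AS bin_no, amount FROM sales"
)

# Step 3: the result table holds the histogram and stays in the database.
conn.execute(
    "CREATE TABLE result_histogram AS "
    "SELECT bin_no, COUNT(*) AS freq FROM work_bins GROUP BY bin_no"
)
histogram = conn.execute(
    "SELECT bin_no, freq FROM result_histogram ORDER BY bin_no"
).fetchall()

print(stats)
print(histogram)
```

In this sketch the detailed rows never leave the database; the calling program sees only the summary row and the small histogram table.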

Analytic Algorithms

The Analytic Algorithms 206 provide statistical and "machine learning"
methods to create Analytic LDMs 200 from the data residing in the relational
database 116. Analytic Algorithms 206 that are completely data driven, such as
association, can be implemented solely in Extended ANSI SQL 208. Analytic
Algorithms 206 that require a combination of SQL and programmatic iteration,
such as induction, can be implemented using the CLI 210. Finally, Analytic
Algorithms 206 that require almost complete programmatic iteration, such as
clustering, can be implemented using a Data Reduction Utility Program 212,
wherein this approach involves data pre-processing that reduces the amount of data
that a non-SQL algorithm can then process.

The Analytic Algorithms 206 significantly improve the performance and
efficiency of data mining operations by providing the technology components to
perform advanced analytic operations directly against the RDBMS 114. In addition,
the Analytic Algorithms 206 leverage the parallelism that exists in the MPP
computer system 100, the RDBMS 114, and the database 116.

The Analytic Algorithms 206 provide data analysts with an unprecedented
option to train and apply "machine learning" analytics against massive amounts of
data in the relational database 116. Prior techniques have failed as their sequential
design is not optimal in an RDBMS 114 environment. Because the Analytic
Algorithms 206 are implemented in Extended ANSI SQL 208, through the CLI
210, and/or by means of the Data Reduction Utility Program 212, they can
therefore leverage the scalability available on the MPP computer system 100. In
addition, taking a data-driven approach to analysis, through the use of complete
Extended ANSI SQL 208, allows people other than highly educated statisticians to
leverage the advanced analytic techniques offered by the Analytic Algorithms 206.

Extended ANSI SQL

As mentioned above, Analytic Algorithms 206 that are completely data
driven, such as affinity analysis, can be implemented solely in Extended ANSI SQL
208. Typically, these types of algorithms operate against a set of tables in the
relational database 116 that are populated with transaction-level data, the source of
which could be point-of-sale devices, automated teller machines, call centers, the
Internet, etc. The SQL statements used to process this data typically build
relationships between and among data elements in the tables. For example, the
SQL statements used to process data from point-of-sale devices may build
relationships between and among products and pairs of products. Additionally, the
dimension of time can be added in such a way that these relationships can be
analyzed to determine how they change over time. As the implementation is solely
in SQL statements, the design takes advantage of the hardware and software
environment of the preferred embodiment by decomposing the SQL statements
into a plurality of sort and merge steps that can be executed concurrently in parallel
by the MPP computer system 100.
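
As a hedged illustration of how relationships between products and pairs of products might be built solely in SQL, the following sketch counts co-occurring product pairs with a self-join. It uses Python with an in-memory SQLite database in place of the RDBMS 114. The table name pos_items, its columns, and the sample rows are hypothetical, and the query assumes each product appears at most once per basket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction-level table: one row per product per basket.
conn.execute("CREATE TABLE pos_items (basket_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO pos_items VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "eggs"),
        (3, "milk"), (3, "eggs"),
        (4, "bread"),
    ],
)

# Self-join on basket_id pairs up products bought together; the
# a.product < b.product condition keeps each unordered pair once.
PAIR_SQL = """
SELECT a.product AS left_item,
       b.product AS right_item,
       COUNT(*)  AS baskets_with_both,
       COUNT(*) * 1.0 / (SELECT COUNT(DISTINCT basket_id) FROM pos_items)
                 AS support
FROM pos_items AS a
JOIN pos_items AS b
  ON a.basket_id = b.basket_id AND a.product < b.product
GROUP BY a.product, b.product
ORDER BY baskets_with_both DESC, left_item, right_item
"""
pairs = conn.execute(PAIR_SQL).fetchall()

for row in pairs:
    print(row)
```

The join and the grouping are exactly the kind of sort and merge work that a parallel database can distribute; a time dimension could be added by including a date column in the GROUP BY clause.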


Call-Level Interface

As mentioned above, Analytic Algorithms 206 that require a mix of
programmatic iteration along with Extended ANSI SQL statements, such as
inductive inference, can be implemented using the CLI 210. Whereas the SQL
approach is appropriate for business problems that are descriptive in nature,
inference problems are predictive in nature and typically require a training phase
where the APPL 110 "learns" various rules based upon the data description,
followed by testing and application phases, where the rules are validated and applied
against a new data set. This class of algorithms is compute-intensive and
historically cannot handle large volumes of data because they expect the analyzed
data to be in a specific fixed or variable flat file format.

Most implementations first extract the data from the database 116 to
construct a flat file and then execute the "train" portion on this resultant file. This
method is slow and limited by the amount of memory available in the computer
system 100. This process can be improved by leveraging the relational database 116
to perform those portions of the analysis, instead of extracting all the data.
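
The following sketch illustrates, under stated assumptions, how a training step might combine SQL with programmatic iteration rather than extracting a flat file. It is not the induction algorithm of the preferred embodiment; it substitutes a simple one-attribute rule learner (often called 1R) because it is short. Python with an in-memory SQLite database stands in for the CLI 210 and RDBMS 114, and the table customers, its columns, and the function induce_one_rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (region TEXT, plan TEXT, churned TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("east", "basic", "yes"),
        ("east", "premium", "no"),
        ("west", "basic", "yes"),
        ("west", "premium", "no"),
        ("east", "basic", "yes"),
        ("west", "premium", "yes"),
    ],
)


def induce_one_rule(conn, table, attributes, target):
    """Pick the attribute whose majority-class rule makes the fewest errors.

    Identifiers are interpolated into the SQL text, so they must come
    from a trusted list, as they do in this example.
    """
    best = None
    for attr in attributes:  # programmatic iteration over candidates
        # The database does the counting; only aggregate rows come back.
        rows = conn.execute(
            f"SELECT {attr}, {target}, COUNT(*) FROM {table} "
            f"GROUP BY {attr}, {target} ORDER BY {attr}, {target}"
        ).fetchall()
        majority = {}
        total = 0
        for value, label, n in rows:
            total += n
            if value not in majority or n > majority[value][1]:
                majority[value] = (label, n)
        errors = total - sum(n for _, n in majority.values())
        rule = {value: label for value, (label, _) in majority.items()}
        if best is None or errors < best[1]:
            best = (attr, errors, rule)
    return best


best_attr, best_errors, best_rule = induce_one_rule(
    conn, "customers", ["region", "plan"], "churned"
)
print(best_attr, best_errors, best_rule)
```

The memory used by the calling program depends on the number of distinct attribute values, not on the number of rows in the table.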

When SQL statements and programmatic iteration are used together, the
RDBMS 114 can be leveraged to perform computations and order data within the
relational database 116, and then extract the information using very little memory
in the APPL 110. Additionally, computations, aggregations and/or ordering can be
run in parallel, because of the massively parallel nature of the RDBMS 114.

Data Reduction Utility Program

As mentioned above, for Analytic Algorithms 206 that can operate on a reduced
or scaled data set, such as regression or clustering, the Data Reduction Utility
Program 212 can be used. The problem of creating analytic models from massive
amounts of detailed data has often been addressed by sampling, mainly because
compute intensive algorithms cannot handle large volumes of data. The approach
of the Data Reduction Utility Program 212 is to reduce data through operations
such as matrix calculations or histogram binning, and then use this reduced or
scaled data as input to a non-SQL algorithm. This method intentionally reduces
fine numerical data details by assigning them to ranges, or bins, correlating their
values or determining their covariances. The capacity of the preferred embodiment
for creating these data structures from massive amounts of data in parallel gives it a
special opportunity in this area.
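
As a minimal sketch of this reduce-then-compute pattern, the following example reduces a table to a handful of sums and cross-products in SQL and then fits a simple linear regression outside the database from those sums alone. Python with an in-memory SQLite database stands in for the RDBMS 114; the table obs, its columns x and y, and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detailed data: one (x, y) observation per row.
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)

# Reduction step in SQL: the whole table collapses to five numbers,
# the entries needed from a sum of squares and cross-products matrix.
n, sx, sy, sxx, sxy = conn.execute(
    "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
).fetchone()

# Non-SQL step: ordinary least squares from the reduced data only.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

print(slope, intercept)
```

However many rows the table holds, the non-SQL step receives the same five values, which is why this class of algorithm does not need sampling to fit in memory.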


Analytic Logical Data Model

The Analytic LDM 200, which is integrated with the relational database 116
and the RDBMS 114, provides logical entity and attribute definitions for advanced
analytic processing, i.e., the Scalable Data Mining Functions 202 and Analytic
Algorithms 206, performed by the RDBMS 114 directly against the relational
database 116. These logical entity and attribute definitions comprise metadata that
define the characteristics of data stored in the relational database 116, as well as
metadata that determines how the RDBMS 114 performs the advanced analytic
processing. The Analytic LDM 200 also stores processing results from this
advanced analytic processing, which includes both result tables and derived data for
the Scalable Data Mining Functions 202, Analytic Algorithms 206, and the Parallel
Deployer 216. The Analytic LDM 200 is a dynamic model, since the logical entities
and attributes definitions change depending upon parameterization of the advanced
analytic processing, and since the Analytic LDM 200 is updated with the results of
the advanced analytic processing.
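
One possible way to picture the two roles described above (metadata that steers the processing, and storage for its results) is sketched below. The schema is purely hypothetical: the table names ldm_column_meta, ldm_model, and ldm_model_result, their columns, and the function describe_numeric_inputs are assumptions made for this example and are not taken from the preferred embodiment. Python with an in-memory SQLite database stands in for the RDBMS 114.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Metadata: which columns of which tables play which analytic role.
    CREATE TABLE ldm_column_meta (table_name TEXT, column_name TEXT, role TEXT);
    -- One row per analytic run, plus its result rows.
    CREATE TABLE ldm_model (model_id INTEGER PRIMARY KEY, algorithm TEXT, params TEXT);
    CREATE TABLE ldm_model_result (model_id INTEGER, stat_name TEXT, stat_value REAL);
    -- Hypothetical detailed data.
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 20.0), (3, 60.0);
    INSERT INTO ldm_column_meta VALUES ('orders', 'amount', 'numeric_input');
    """
)


def describe_numeric_inputs(conn, table):
    """Read metadata, run the analysis in SQL, and store the results."""
    cols = [
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM ldm_column_meta "
            "WHERE table_name = ? AND role = 'numeric_input'",
            (table,),
        )
    ]
    cur = conn.execute(
        "INSERT INTO ldm_model (algorithm, params) VALUES (?, ?)",
        ("describe", table),
    )
    model_id = cur.lastrowid
    for col in cols:
        # Column names come from the metadata table, which is trusted here.
        (mean,) = conn.execute(f"SELECT AVG({col}) FROM {table}").fetchone()
        conn.execute(
            "INSERT INTO ldm_model_result VALUES (?, ?, ?)",
            (model_id, f"mean:{col}", mean),
        )
    return model_id


model_id = describe_numeric_inputs(conn, "orders")
results = conn.execute(
    "SELECT model_id, stat_name, stat_value FROM ldm_model_result"
).fetchall()
print(results)
```

Each run adds rows to the result tables, which reflects the sense in which the model is dynamic: its contents depend on how the processing was parameterized.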

Logic of the Preferred Embodiment

Flowcharts which illustrate the logic of the preferred embodiment of the
present invention are provided in FIGS. 3, 4 and 5. Those skilled in the art will
recognize that this logic is provided for illustrative purposes only and that different
logic may be used to accomplish the same results.

Referring to FIG. 3, this flowchart illustrates the logic of the Scalable Data
Mining Functions 202 according to the preferred embodiment of the present
invention.

Block 300 represents the one or more of the Scalable Data Mining Functions
202 being created via the API 204. This may entail, for example, the instantiation
of an object providing the desired function.

Block 302 represents certain parameters being passed to the API 204, in
order to control the operation of the Scalable Data Mining Functions 202.

Block 304 represents the metadata in the Analytic LDM 200 being accessed,
if necessary for the operation of the Scalable Data Mining Function 202.

Block 306 represents the API 204 generating a Scalable Data Mining
Function 202 in the form of a data mining query based on the passed parameters
and optional metadata.

Block 308 represents the Scalable Data Mining Function 202 being passed to
the RDBMS 114 for execution.
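
The flow of Blocks 300 through 308 can be sketched, under stated assumptions, as an object that is instantiated, parameterized, turned into a query, and handed to the database. The class FrequencyFunction, its fields, and the table visits are hypothetical names invented for this example; they are not the Analytic API 204 itself. The optional metadata access of Block 304 is omitted. Python with an in-memory SQLite database stands in for the RDBMS 114.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class FrequencyFunction:
    """Hypothetical stand-in for one parameterized data mining function."""

    table: str       # Block 302: parameters that control the function
    column: str
    top_n: int = 10

    def to_sql(self) -> str:
        # Block 306: generate the data mining query from the parameters.
        # Identifiers are interpolated, so they must be trusted values.
        return (
            f"SELECT {self.column} AS value, COUNT(*) AS freq "
            f"FROM {self.table} GROUP BY {self.column} "
            f"ORDER BY freq DESC, value LIMIT {int(self.top_n)}"
        )

    def run(self, conn):
        # Block 308: pass the generated query to the database for execution.
        return conn.execute(self.to_sql()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("home",), ("home",), ("cart",), ("home",), ("cart",), ("help",)],
)

# Block 300: the function is created by instantiating an object.
fn = FrequencyFunction(table="visits", column="page", top_n=2)
top_pages = fn.run(conn)
print(top_pages)
```

Keeping query generation in to_sql, separate from execution in run, lets a caller inspect or log the generated statement before it is sent to the database.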

Referring to FIG. 4, this flowchart illustrates the logic of the Analytic
Algorithms 206 according to the preferred embodiment of the present invention.

Block 400 represents the Analytic Algorithms 206 being invoked, either
directly or via the Analytic Algorithm API 214.

Block 402 represents certain parameters being passed to the Analytic
Algorithms 206, in order to control their operation.

Block 404 represents the metadata in the Analytic LDM 200 being accessed,
if necessary for the operation of the Analytic Algorithms 206.

Block 406 represents the Analytic Algorithms 206 passing SQL statements
to the RDBMS 114 for execution and Block 408 optionally represents the Analytic
Algorithms 206 performing programmatic iteration. Those skilled in the art will
recognize that the sequence of these steps may differ from those described above,
in that the sequence may not include both steps, it may include additional steps,
and it may include iterations of these steps.

Block 410 represents the Analytic Algorithms 206 storing results in the
Analytic LDM 200.
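The pattern of SQL statements combined with programmatic iteration in FIG. 4 can
be illustrated with a small Python sketch. The algorithm chosen here, a
two-centroid, one-dimensional k-means, is a stand-in picked for brevity and is
not taken from the specification. SQLite stands in for the RDBMS 114, and the
table names points and ldm_cluster_model, as well as the two_means function, are
hypothetical.

```python
import sqlite3

# One SQL statement per pass: assign each point to its nearer centroid, then average.
ASSIGN_AND_AVERAGE = """
    SELECT CASE WHEN ABS(x - ?) <= ABS(x - ?) THEN 0 ELSE 1 END AS cluster,
           AVG(x)
    FROM points
    GROUP BY cluster
    ORDER BY cluster
"""


def two_means(conn, c0, c1, max_iter=20):
    """Iterate in the host program; do the per-pass arithmetic in SQL."""
    for _ in range(max_iter):
        # Block 406: the SQL statement runs inside the database.
        means = dict(conn.execute(ASSIGN_AND_AVERAGE, (c0, c1)).fetchall())
        # An empty cluster yields no row, so its centroid keeps the previous value.
        new = (means.get(0, c0), means.get(1, c1))
        if new == (c0, c1):
            break  # Block 408: the iteration stops once the centroids are stable.
        c0, c1 = new
    return c0, c1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL)")
conn.executemany("INSERT INTO points VALUES (?)",
                 [(v,) for v in (1.0, 2.0, 3.0, 10.0, 11.0, 12.0)])

centroids = two_means(conn, 0.0, 5.0)

# Block 410: store the resulting model in a (hypothetical) result table.
conn.execute("CREATE TABLE ldm_cluster_model (cluster INTEGER, centroid REAL)")
conn.executemany("INSERT INTO ldm_cluster_model VALUES (?, ?)",
                 list(enumerate(centroids)))
model = conn.execute(
    "SELECT cluster, centroid FROM ldm_cluster_model ORDER BY cluster").fetchall()
```

Only two averages cross the database boundary on each pass; the rows themselves
stay in the database, which is the point of pairing SQL with a small amount of
host-side iteration.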

Referring to FIG. 5, this flowchart illustrates the logic performed by the
RDBMS 114 according to the preferred embodiment of the present invention.

Block 500 represents the RDBMS 114 receiving a query or other SQL
statements.

Block 502 represents the RDBMS 114 analyzing the query. 

Block 504 represents the RDBMS 114 generating a plan that enables the
RDBMS 114 to retrieve the correct information from the relational database 116 to
satisfy the query.

Block 506 represents the RDBMS 114 compiling the plan into object code 
for more efficient execution by the RDBMS 114, although it could be interpreted 
rather than compiled. 

Block 508 represents the RDBMS 114 initiating execution of the plan. 

Block 510 represents the RDBMS 114 generating results from the execution 
of the plan. 

Block 512 represents the RDBMS 114 either storing the results in the 
Analytic LDM 200, or returning the results to the Analytic Algorithm 206, APPL 
110, and/or Client 118. 
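The steps of FIG. 5 can be observed, in miniature, in any SQL engine that exposes
its query plans. The Python sketch below uses SQLite as a stand-in for the RDBMS
114, so the details differ: SQLite compiles a statement to bytecode for its own
virtual machine rather than to object code, and the plan text it reports varies
by version. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_region ON sales (region)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("west", 20.0), ("east", 30.0)])

query = "SELECT SUM(amount) FROM sales WHERE region = ?"  # Block 500: the query

# Blocks 502 and 504: the engine analyzes the query and produces a plan.
# Each returned row describes one step; the last column holds readable text.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("east",)).fetchall()
plan_steps = [row[-1] for row in plan]

# Blocks 506 to 510: the statement is prepared, executed, and yields results.
total = conn.execute(query, ("east",)).fetchone()[0]

# Block 512: the result may be returned to the caller or stored in a table.
conn.execute("CREATE TABLE ldm_results (metric TEXT, value REAL)")
conn.execute("INSERT INTO ldm_results VALUES ('east_total', ?)", (total,))
stored = conn.execute("SELECT metric, value FROM ldm_results").fetchall()
```

Printing plan_steps shows how the engine intends to reach the rows, for example
whether it uses the index on region, before any data is read.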

CONCLUSION 
This concludes the description of the preferred embodiment of the 
invention. The following describes an alternative embodiment for accomplishing 
the same invention. Specifically, in an alternative embodiment, any type of 
computer, such as a mainframe, minicomputer, or personal computer, could be
used to implement the present invention.

In summary, the present invention discloses a method, apparatus, and article
of manufacture for performing data mining applications in a relational database
management system. At least one analytic algorithm is performed by a computer
directly against a relational database, wherein the analytic algorithm includes SQL
statements performed by the relational database management system and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

The foregoing description of the preferred embodiment of the invention has
been presented for the purposes of illustration and description. It is not intended to
be exhaustive or to limit the invention to the precise form disclosed. Many
modifications and variations are possible in light of the above teachings. It is
intended that the scope of the invention be limited not by this detailed description,
but rather by the claims appended hereto.
WHAT IS CLAIMED IS:

1. A computer-implemented system for performing data mining
applications, comprising:

(a) a computer having one or more data storage devices connected thereto;

(b) a relational database management system, executed by the computer, for
managing a relational database stored on the data storage devices; and

(c) at least one analytic algorithm performed by the computer, wherein the
analytic algorithm includes SQL statements performed by the relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

2. The computer-implemented system of claim 1, wherein the analytic
algorithm provides statistical and machine learning methods for creating the
analytic logical data model.

3. The computer-implemented system of claim 1, wherein the analytic 
algorithm is implemented in Extended ANSI SQL. 


4. The computer-implemented system of claim 3, wherein the analytic
algorithm operates against a set of tables in the relational database, and the
Extended ANSI SQL builds relationships among data elements in the tables.

5. The computer-implemented system of claim 4, wherein the Extended
ANSI SQL analyzes the relationships to determine how the relationships change.

6. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented in a Call Level Interface (CLI) that processes data from 
the relational database using SQL and programmatic iteration. 

7. The computer-implemented system of claim 6, wherein the CLI is 
used with SQL to perform computations, aggregations, and/or ordering on the data 
from the relational database. 
8. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented by a Data Reduction Utility Program that reduces data
from the relational database in bulk using SQL followed by a non-SQL iterative
program.



9. The computer-implemented system of claim [illegible], wherein the
analytic algorithm is a sequence of Extended ANSI SQL followed
by programmatic iteration.

10. A method for performing data mining applications, comprising:

(a) managing a relational database stored on one or more data storage devices
connected to a computer; and

(b) performing at least one analytic algorithm in the computer, wherein the
analytic algorithm includes SQL statements performed by a relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

11. An article of manufacture comprising logic embodying a method for
performing data mining applications, comprising:

(a) managing a relational database stored on one or more data storage devices
connected to a computer; and

(b) performing at least one analytic algorithm in the computer, wherein the
analytic algorithm includes SQL statements performed by a relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.